
Parity problem




Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond

Suzuki, Taiji, Wu, Denny

Neural Information Processing Systems

Mean-field Langevin dynamics (MFLD) (Mei et al., 2018; Hu et al., 2019) is particularly attractive: MFLD arises from a noisy gradient descent update on the parameters, in which Gaussian noise is injected into the gradient to encourage "exploration". Furthermore, uniform-in-time estimates of the particle discretization error have also been established (Suzuki et al.). The goal of this work is to address the following question.
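The noisy gradient descent update underlying MFLD can be sketched as follows; the toy quadratic potential, step size, and temperature here are illustrative choices for exposition, not the paper's setting:

```python
import numpy as np

def mfld_step(particles, grad_loss, lr=0.05, temperature=0.1, rng=None):
    """One noisy gradient descent step: a gradient step plus Gaussian noise
    scaled so the iteration discretizes a Langevin diffusion."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(particles.shape)
    return particles - lr * grad_loss(particles) + np.sqrt(2 * lr * temperature) * noise

# Toy example: particles descending a quadratic potential V(w) = |w|^2 / 2,
# so grad_loss(w) = w. The injected noise keeps the empirical distribution
# of particles spread out ("exploration") instead of collapsing to a point.
rng = np.random.default_rng(0)
particles = rng.standard_normal((256, 2)) * 5.0
for _ in range(500):
    particles = mfld_step(particles, lambda w: w, rng=rng)

# The particles settle near the Gibbs distribution of the potential,
# with a spread set by the temperature rather than at the exact minimum.
print(np.mean(np.sum(particles**2, axis=1)))
```

The noise scale `sqrt(2 * lr * temperature)` is what distinguishes this from plain gradient descent: as the temperature goes to zero the update recovers ordinary gradient descent on the particles.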


Matching the Statistical Query Lower Bound for k-Sparse Parity Problems with Sign Stochastic Gradient Descent

Neural Information Processing Systems

The $k$-sparse parity problem is a classical problem in computational complexity and algorithmic theory, serving as a key benchmark for understanding computational classes. In this paper, we solve the $k$-sparse parity problem with sign stochastic gradient descent, a variant of stochastic gradient descent (SGD), on two-layer fully-connected neural networks. We demonstrate that this approach can efficiently solve the $k$-sparse parity problem on a $d$-dimensional hypercube ($k\le O(\sqrt{d})$) with a sample complexity of $\tilde{O}(d^{k-1})$ using $2^{\Theta(k)}$ neurons, matching the established $\Omega(d^{k})$ lower bounds of Statistical Query (SQ) models. Our theoretical analysis begins by constructing a good neural network capable of correctly solving the $k$-parity problem. We then demonstrate how a trained neural network with sign SGD can effectively approximate this good network, solving the $k$-parity problem with small statistical errors. To the best of our knowledge, this is the first result that matches the SQ lower bound for solving the $k$-sparse parity problem using gradient-based methods.
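A minimal sketch of the sign SGD update on a two-layer ReLU network for a small sparse-parity instance; the width, step size, squared loss, and the choice to train only the first layer are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, width, lr = 8, 2, 64, 0.02

# Points on the d-dimensional hypercube {-1, +1}^d, labeled by the
# parity (product) of the first k coordinates.
X = rng.choice([-1.0, 1.0], size=(1024, d))
y = np.prod(X[:, :k], axis=1)

W = rng.standard_normal((d, width)) / np.sqrt(d)   # trained first layer
a = rng.choice([-1.0, 1.0], size=width) / width    # fixed second layer

def forward(X):
    return np.maximum(X @ W, 0.0) @ a              # two-layer ReLU network

def grad_W(Xb, yb):
    """Gradient of the squared loss with respect to the first layer."""
    err = forward(Xb) - yb                         # residual
    mask = (Xb @ W) > 0                            # ReLU gate
    return Xb.T @ (err[:, None] * mask * a[None, :])

for _ in range(400):
    idx = rng.integers(0, 1024, size=64)           # minibatch
    # Sign SGD: step by the elementwise sign of the gradient, so every
    # coordinate moves by exactly 0 or +/- lr regardless of gradient scale.
    W -= lr * np.sign(grad_W(X[idx], y[idx]))

acc = np.mean(np.sign(forward(X)) == y)
print(f"train accuracy: {acc:.2f}")                # depends on these choices
```

The defining property of the update is that each weight changes by exactly `0` or `±lr` per step, which decouples the step size from the gradient magnitude.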







Task Generalization With AutoRegressive Compositional Structure: Can Learning From $d$ Tasks Generalize to $d^{T}$ Tasks?

Abedsoltan, Amirhesam, Zhang, Huaqing, Wen, Kaiyue, Lin, Hongzhou, Zhang, Jingzhao, Belkin, Mikhail

arXiv.org Machine Learning

Large language models (LLMs) exhibit remarkable task generalization, solving tasks they were never explicitly trained on with only a few demonstrations. This raises a fundamental question: When can learning from a small set of tasks generalize to a large task family? In this paper, we investigate task generalization through the lens of AutoRegressive Compositional (ARC) structure, where each task is a composition of $T$ operations, and each operation is among a finite family of $d$ subtasks. This yields a total class of size $d^{T}$. We first show that generalization to all $d^{T}$ tasks is theoretically achievable by training on only $\tilde{O}(d)$ tasks. Empirically, we demonstrate that Transformers achieve such exponential task generalization on sparse parity functions via in-context learning (ICL) and Chain-of-Thought (CoT) reasoning. We further demonstrate this generalization in arithmetic and language translation, extending beyond parity functions.
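The ARC structure can be sketched concretely for the sparse-parity case; the subtask family (XOR-ing in one input bit) and the function names are our illustrative choices, not the paper's code:

```python
import itertools

# Each task is a composition of T operations, each drawn from a family of
# d subtasks, so the task family has size d**T. Here a subtask XORs one
# input bit into a running state, and the chain-of-thought is the sequence
# of intermediate parities produced along the way.

def apply_task(task, x):
    """Run a composed task step by step, returning the CoT trace."""
    state, trace = 0, []
    for coord in task:          # T autoregressive steps
        state ^= x[coord]       # one subtask: XOR in coordinate `coord`
        trace.append(state)
    return trace                # trace[-1] is the task's final output

d, T = 3, 2
tasks = list(itertools.product(range(d), repeat=T))
assert len(tasks) == d ** T     # 9 distinct tasks from only 3 subtasks

x = [1, 0, 1]
print(apply_task((0, 2), x))    # parity of bits 0 and 2, via a 2-step CoT
# → [1, 0]
```

Because every task is assembled from the same small subtask family, a learner that masters the $d$ subtasks and the compositional rule can in principle cover all $d^{T}$ compositions, which is the intuition behind training on only $\tilde{O}(d)$ tasks.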


From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency

Wen, Kaiyue, Zhang, Huaqing, Lin, Hongzhou, Zhang, Jingzhao

arXiv.org Machine Learning

Chain-of-thought (CoT) has proven to be a powerful technique for enhancing reasoning in large language models [29, 63]. By instructing the model to break complex problems into smaller, manageable steps, CoT facilitates more efficient reasoning and better generalization, particularly in algorithmic and logical tasks [32, 45, 60]. Building on this, performance can be further improved through multi-step prompting and multi-path sampling techniques [10, 20, 59, 74, 75]. This focus on CoT within in-context learning has since expanded to more structured learning approaches [6, 69]. By adding reasoning examples of CoT style to the instruction-tuning dataset, models enhance their problem-solving abilities more effectively than relying solely on CoT during prompting [11, 72]. As a result, CoT is now shaping a new paradigm in language model development, marking a shift from simply scaling data [22, 25] to focusing on advanced reasoning strategies [39], which leads to more effective learning outcomes. While CoT's success is well-established, understanding why it works is still a hotly debated topic [48, 51]. Recent theoretical studies suggest that CoT enhances a model's expressiveness, increasing its representational capacity when the sequence is long enough [18, 37]. However, expressivity alone does not guarantee success.